Using Rule Sets to Maximize ROC Performance
Abstract
Rules are commonly used for classification because they are modular, intelligible, and easy to learn. Existing work in classification rule learning assumes that the goal is to produce categorical classifications that maximize classification accuracy. Recent work in machine learning has pointed out the limitations of classification accuracy: when class distributions are skewed or error costs are unequal, an accuracy-maximizing rule set can perform poorly. A more flexible use of a rule set is to produce instance scores indicating the likelihood that an instance belongs to a given class. With this ability, rule sets can be applied effectively even when distributions are skewed or error costs are unequal. This paper empirically investigates different strategies for evaluating rule sets when the goal is to maximize scoring (ROC) performance.
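As a rough sketch of the scoring view described in the abstract, the snippet below scores each instance with the Laplace-corrected precision of the first rule that fires and evaluates the resulting ranking with ROC AUC via scikit-learn. The rule representation, coverage counts, and data are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch: using a categorical rule set as an instance scorer and
# evaluating the ranking with ROC AUC. The rule representation, coverage
# counts, and data below are illustrative assumptions, not the paper's.
from sklearn.metrics import roc_auc_score

# Each rule: (predicate over an instance, positives covered, negatives covered)
rules = [
    (lambda x: x["age"] > 40 and x["income"] > 50_000, 30, 2),
    (lambda x: x["income"] > 80_000, 12, 4),
]
DEFAULT_SCORE = 0.1  # fallback when no rule fires, e.g. the positive-class prior

def score(instance):
    """Score = Laplace-corrected precision of the first rule that fires."""
    for predicate, pos, neg in rules:
        if predicate(instance):
            return (pos + 1) / (pos + neg + 2)
    return DEFAULT_SCORE

instances = [
    {"age": 45, "income": 90_000},
    {"age": 30, "income": 20_000},
    {"age": 55, "income": 60_000},
    {"age": 25, "income": 85_000},
]
labels = [1, 0, 1, 0]

scores = [score(x) for x in instances]
print("ROC AUC of the rule-based ranking:", roc_auc_score(labels, scores))
```

Because the scores only need to rank instances, any monotone rule-quality estimate could stand in for the Laplace precision used here.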
Similar resources
AMORI: A Metric-Based One Rule Inducer
The requirements of real-world data mining problems vary extensively. It is plausible to assume that some of these requirements can be expressed as application-specific performance metrics. An algorithm that is designed to maximize performance given a certain learning metric may not produce the best possible result according to these application-specific metrics. We have implemented A Metric-bas...
Using Distribution of Data to Enhance Performance of Fuzzy Classification Systems
This paper considers the automatic design of fuzzy rule-based classification systems based on labeled data. The classification performance and interpretability are of major importance in these systems. In this paper, we utilize the distribution of training patterns in the decision subspace of each fuzzy rule to improve its initially assigned certainty grade (i.e. rule weight). Our approach uses a punish...
Maximizing the Area under the ROC Curve using Incremental Reduced Error Pruning
The use of incremental reduced error pruning for maximizing the area under the ROC curve (AUC) instead of accuracy is investigated. A commonly used accuracy-based exclusion criterion is shown to include rules that result in concave ROC curves as well as to exclude rules that result in convex ROC curves. A previously proposed exclusion criterion for unordered rule sets, based on the lift, is on ...
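For context on the lift-based criterion mentioned above, the sketch below computes a rule's lift as its precision divided by the positive-class prior and keeps the rule only if the lift exceeds 1. The counts and the threshold are illustrative assumptions, not the exact criterion from that paper.

```python
# Sketch of a lift-based inclusion test for a candidate rule. The counts and
# the threshold of 1.0 are illustrative assumptions, not the exact criterion
# used with incremental reduced error pruning in the cited work.

def lift(tp, fp, total_pos, total_neg):
    """Precision of the rule divided by the prior probability of the positive class."""
    precision = tp / (tp + fp)
    prior = total_pos / (total_pos + total_neg)
    return precision / prior

# A rule covering 40 positives and 10 negatives on a 100-positive / 400-negative set.
candidate_lift = lift(tp=40, fp=10, total_pos=100, total_neg=400)
print(f"lift = {candidate_lift:.1f}")  # 0.8 / 0.2 = 4.0

# Keep the rule only if it concentrates positives more densely than the prior.
print("include rule:", candidate_lift > 1.0)
```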
Boosting First-Order Clauses for Large, Skewed Data Sets
Creating an effective ensemble of clauses for large, skewed data sets requires finding a diverse, high-scoring set of clauses and then combining them in such a way as to maximize predictive performance. We have adapted the RankBoost algorithm in order to maximize area under the recall-precision curve, a much better metric when working with highly skewed data sets than ROC curves. We have also expl...
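As a reference point for the recall-precision metric discussed above, the sketch below computes the area under the recall-precision curve for made-up scores with a simple step-wise (average-precision style) sum; it is not the adapted RankBoost objective itself.

```python
# Sketch: area under the recall-precision curve for a scored, skewed sample,
# computed as a step-wise (average-precision style) sum. The scores and labels
# are made up; this is not the adapted RankBoost objective itself.

def area_under_pr_curve(labels, scores):
    """Sum precision * (recall increment) while walking down the ranking."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(labels)
    tp = fp = 0
    area = 0.0
    for _, label in ranked:
        if label == 1:
            tp += 1
            area += (tp / (tp + fp)) * (1 / total_pos)  # recall rises by 1/total_pos
        else:
            fp += 1
    return area

labels = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]  # skewed: 3 positives, 7 negatives
scores = [0.9, 0.8, 0.7, 0.65, 0.5, 0.4, 0.35, 0.3, 0.2, 0.1]
print(f"area under the recall-precision curve: {area_under_pr_curve(labels, scores):.3f}")
```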
Convex Hull-Based Multi-objective Genetic Programming for Maximizing ROC Performance
Receiver operating characteristic (ROC) analysis is usually used to analyse the performance of classifiers in data mining. An important ROC analysis topic is the ROC convex hull (ROCCH), which is the least convex majorant (LCM) of the empirical ROC curve and covers potential optima for the given set of classifiers. Generally, ROC performance maximization can be considered as maximizing the ROCCH, which also...
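The ROCCH idea can be illustrated directly: given the (FPR, TPR) operating points of several classifiers, the sketch below recovers the hull and the classifiers that remain potentially optimal. The points are made up and the hull routine is a generic upper-convex-hull computation, not the paper's genetic-programming approach.

```python
# Sketch: ROC convex hull (ROCCH) of a set of classifiers' (FPR, TPR) points.
# The points are made up and the hull is a generic upper-convex-hull routine,
# not the multi-objective genetic programming method described in the paper.

def roc_convex_hull(points):
    """Upper convex hull of (FPR, TPR) points, anchored at (0, 0) and (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Pop points that fall on or below the line from hull[-2] to p:
        # they are dominated by interpolating between the neighbouring points.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

classifiers = {"A": (0.1, 0.6), "B": (0.3, 0.7), "C": (0.35, 0.9), "D": (0.6, 0.85)}
hull = roc_convex_hull(classifiers.values())
print("ROCCH vertices:", hull)
print("potentially optimal:", [name for name, pt in classifiers.items() if pt in hull])
```

Classifiers whose points fall strictly below the hull are never optimal for any class distribution or cost ratio, which is why the hull covers the potential optima mentioned in the abstract.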
Publication date: 2001